generate llms.txt for our docs #3273
base: develop
* Add new server management and collaboration features
* Add Python environment configuration guides
* Add understanding of ZenML artifacts and complex use-cases
* test redirect
* one more
* revert redirects
* revert redirects
* add page placeholder for collaborate with team
* add icon
* move files to the right directories
* update toc with new paths
* add all redirects
* remove .md and README from the left pane
* fix all broken links
* fix more links

---------

Co-authored-by: Jayesh Sharma <[email protected]>
(cherry picked from commit ae73e2e)
Adding @schustmi for visibility from the product side, but feel free to switch this out for someone more appropriate...
Leaving some initial comments; I'll go through this again.
@@ -0,0 +1,34 @@
import json
All the `.py` files still need the license.
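A small sketch of how this could be checked automatically. It assumes every licensed file contains the Apache 2.0 phrase used as `LICENSE_MARKER` within its first lines; the marker text, the 20-line window, and the helper name `files_missing_license` are assumptions, not existing repo tooling.

```python
from pathlib import Path

# Assumption: every licensed file contains this phrase in its header.
LICENSE_MARKER = "Licensed under the Apache License, Version 2.0"


def files_missing_license(root: str, marker: str = LICENSE_MARKER,
                          header_lines: int = 20) -> list[str]:
    """Return .py files under root whose first header_lines lack the marker."""
    base = Path(root)
    missing = []
    for path in sorted(base.rglob("*.py")):
        with path.open(encoding="utf-8") as f:
            # Read at most header_lines lines; the header sits at the top.
            head = "".join(line for _, line in zip(range(header_lines), f))
        if marker not in head:
            missing.append(path.relative_to(base).as_posix())
    return missing
```

Running it over `scripts/` in CI would list exactly the files that still need the header.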
jobs:
  check-batch:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
Not sure if I'm missing something here, but it seems the only trigger for this workflow is a push event on the release branches. I have two concerns about this:
- The `if` condition here, `github.event.workflow_run.conclusion == 'success'`, might never be true, because push events have no associated workflow run and therefore no conclusion.
- The current trigger logic might also fire when we backport docs changes or similar things to the existing release branches. Is this the desired behaviour?
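To illustrate the first concern: GitHub Actions puts the triggering event's JSON payload at the path in `GITHUB_EVENT_PATH`, and a `workflow_run` object only appears in payloads of `workflow_run` events. The minimal Python sketch below mirrors the `if:` expression under that assumption; `should_check_batch` is a hypothetical helper, not part of this PR.

```python
import json
import os
from typing import Any, Mapping


def should_check_batch(payload: Mapping[str, Any]) -> bool:
    """Mirror of `github.event.workflow_run.conclusion == 'success'`.

    A push payload has no "workflow_run" key, so the lookup falls back to an
    empty dict, the conclusion is None, and the comparison is False.
    """
    run = payload.get("workflow_run") or {}
    return run.get("conclusion") == "success"


def main() -> None:
    # GITHUB_EVENT_PATH is set by the GitHub Actions runner.
    event_path = os.environ.get("GITHUB_EVENT_PATH")
    payload: dict = {}
    if event_path:
        with open(event_path, encoding="utf-8") as f:
            payload = json.load(f)
    print("run" if should_check_batch(payload) else "skip")


if __name__ == "__main__":
    main()
```

If the job is meant to run after another workflow completes, the workflow also needs an `on: workflow_run` trigger; with only `on: push`, the condition can never be true.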
# Process OpenAI batch results
python scripts/check_batch_output.py

# Upload all files to HuggingFace
The first half of the step here lives in the `check_batch_output` Python script, whereas the second half is hard-coded here. Why don't we merge the two? Wouldn't that be easier to manage?
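A minimal sketch of what merging the two halves could look like. It assumes the batch output is an OpenAI Batch API results file (JSONL, one object per request with `custom_id`, `error`, and a `response` whose `body` is a chat completion) and that the upload goes through `huggingface_hub.HfApi.upload_folder` to a dataset repo. The function names, the output file name, and the `repo_id` value are hypothetical, not taken from `check_batch_output.py`.

```python
import json
from pathlib import Path


def parse_batch_output(jsonl_text: str) -> dict[str, str]:
    """Map each request's custom_id to the summary text the model returned."""
    summaries = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            continue  # skip failed requests; a real script might log them
        body = response["body"]
        summaries[record["custom_id"]] = body["choices"][0]["message"]["content"]
    return summaries


def write_and_upload(summaries: dict[str, str], out_dir: Path,
                     repo_id: str, dry_run: bool = True) -> list[Path]:
    """Write the summaries to one file and optionally upload the folder."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "how-to-summaries.txt"  # hypothetical file name
    out_file.write_text(
        "\n\n".join(summaries[key] for key in sorted(summaries)),
        encoding="utf-8",
    )
    written = sorted(out_dir.iterdir())
    if not dry_run:
        # Imported lazily so parsing works without huggingface_hub installed.
        from huggingface_hub import HfApi
        HfApi().upload_folder(folder_path=str(out_dir), repo_id=repo_id,
                              repo_type="dataset")  # repo type is an assumption
    return written
```

With both steps in one script, the workflow step reduces to a single `python scripts/check_batch_output.py` call, and the parsing can be tested locally with `dry_run=True`.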
md_files = list(Path(docs_dir).rglob("*.md"))
md_files = [file for file in md_files if file.name not in exclude_files]

# delete files before docs/book/how-to/infrastructure-deployment/stack-deployment/README.md in the list
Why do we need to delete the files before this specific one?
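One reason the question matters: `Path.rglob` makes no ordering guarantee (results follow the order in which the operating system lists directory entries), so "the files before X in the list" can differ between machines. A minimal sketch, assuming the goal is to summarize only part of the docs tree, that sorts the results and selects by relative path prefix instead of by list position; `select_docs` and `include_prefixes` are hypothetical names.

```python
from pathlib import Path


def select_docs(docs_dir: str, exclude_files: set[str],
                include_prefixes: tuple[str, ...] = ()) -> list[Path]:
    """Collect Markdown files deterministically.

    Files are sorted by their path relative to docs_dir, dropped if their
    name is in exclude_files, and, when include_prefixes is non-empty, kept
    only if their relative path starts with one of the prefixes.
    """
    root = Path(docs_dir)
    files = sorted(root.rglob("*.md"),
                   key=lambda p: p.relative_to(root).as_posix())
    selected = []
    for path in files:
        rel = path.relative_to(root).as_posix()
        if path.name in exclude_files:
            continue
        if include_prefixes and not rel.startswith(include_prefixes):
            continue
        selected.append(path)
    return selected
```

Filtering by prefix keeps working if files are added or renamed, whereas slicing at a hard-coded README silently changes which files are summarized.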
temperature=0.3,  # Lower temperature for more focused summaries
max_output_tokens=2000,
safety_settings=[
    types.SafetySetting(category="HARM_CATEGORY_HATE_SPEECH", threshold="OFF"),
Why did we set the threshold to `OFF`?
Describe changes
I implemented scripts that generate llms.txt files for our documentation. llms.txt is a proposed convention for publishing files that general-purpose LLMs can be given so they can answer questions about a specific tool.
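For readers unfamiliar with the proposal: an llms.txt file is Markdown with an H1 title, a blockquote summary, and H2 sections containing link lists of the form `- [name](url): notes`. A minimal sketch of rendering such an index from a section-to-pages mapping; `render_llms_txt` and its input structure are hypothetical and not the output format of this PR's scripts.

```python
def render_llms_txt(title: str, summary: str,
                    sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render an llms.txt index: H1 title, blockquote summary, H2 link lists.

    Each section maps to (name, url, notes) tuples; notes may be empty,
    in which case the ": notes" suffix is omitted.
    """
    lines = [f"# {title}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        for name, url, notes in links:
            entry = f"- [{name}]({url})"
            lines.append(f"{entry}: {notes}" if notes else entry)
    return "\n".join(lines) + "\n"
```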
We generate several files instead of one, since the full version is huge and not well suited to models with limited context windows.
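A sketch of the splitting idea, assuming the docs are grouped by top-level directory and that a rough characters-per-token ratio is good enough for budgeting context. The 4-characters-per-token heuristic, the `max_tokens` default, the `llms-<section>.txt` naming, and the `root` section for top-level files are all assumptions, not what the PR does.

```python
from pathlib import Path


def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Very rough token estimate (assumed ~4 characters per token)."""
    return len(text) // chars_per_token


def split_by_section(docs_dir: str) -> dict[str, str]:
    """Concatenate Markdown files into one text per top-level docs directory.

    Files directly under docs_dir go into a section named "root".
    """
    root = Path(docs_dir)
    sections: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root)
        name = rel.parts[0] if len(rel.parts) > 1 else "root"
        body = path.read_text(encoding="utf-8")
        sections.setdefault(name, []).append(f"# {rel.as_posix()}\n\n{body}")
    return {name: "\n\n".join(chunks) for name, chunks in sections.items()}


def write_llms_files(docs_dir: str, out_dir: str,
                     max_tokens: int = 100_000) -> list[str]:
    """Write llms-<section>.txt files and warn about sections over budget."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, text in split_by_section(docs_dir).items():
        if estimate_tokens(text) > max_tokens:
            print(f"warning: section {name!r} is ~{estimate_tokens(text)} tokens")
        target = out / f"llms-{name}.txt"
        target.write_text(text, encoding="utf-8")
        written.append(target.name)
    return sorted(written)
```

Sections that trigger the warning are the natural candidates for LLM summarization instead of verbatim inclusion.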
The PR adds the generation to CI in two stages: the batch job that summarizes the how-to sections starts when the prepare-release workflow finishes, and checking its output and uploading the files to Hugging Face happen after a push to the release branch.
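Because the two stages are decoupled, the upload stage may run before the batch has finished, since the OpenAI Batch API processes jobs asynchronously. A minimal sketch of how the check step could branch on the batch's `status` field, using the status values documented for the Batch API; `next_action` and the action strings are hypothetical, and treating every non-completing status as "abort" is just one possible policy.

```python
# Status values documented for the OpenAI Batch API.
PENDING = {"validating", "in_progress", "finalizing"}
WILL_NOT_COMPLETE = {"failed", "expired", "cancelling", "cancelled"}


def next_action(status: str) -> str:
    """Map a batch status to what the release-branch job could do next.

    With the openai package, the status could be fetched via
    OpenAI().batches.retrieve(batch_id).status.
    """
    if status == "completed":
        return "process"  # download the output file, then upload
    if status in PENDING:
        return "wait"  # e.g. exit without uploading and re-run later
    if status in WILL_NOT_COMPLETE:
        # An expired batch may still expose partial results in its output file.
        return "abort"
    raise ValueError(f"unknown batch status: {status!r}")
```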
What's missing?
Pre-requisites
Please ensure you have done the following:
`develop` and the open PR is targeting `develop`. If your branch wasn't based on `develop`, read the Contribution guide on rebasing a branch to `develop`.
Types of changes